Recognizing Named Entities in Tweets

نویسندگان

  • Xiaohua Liu
  • Shaodian Zhang
  • Furu Wei
  • Ming Zhou
چکیده

The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semisupervised learning.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Embeddings for Both Entity Recognition and Linking in Tweet

English. The paper describes our submissions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a technique of Named Entity tagging that exploits both character-level and word-level embeddings. Character-based embeddings allow learning the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger al...

متن کامل

Identifying Tweets with Implicit Entity Mentions

ALEX, ADARSH. M.S., Department of Computer Science and Engineering, Wright State University, 2016. Identifying Tweets with Implicit Entity Mentions Social networking sites like Twitter and Facebook have become a significant source of user-generated content in the past decade. Mining of this user-generated content has proved beneficial for a broad range of applications like Event Extraction, Doc...

متن کامل

Knowledge-based Approach for Event Extraction from Arabic Tweets

Tweets provide a continuous update on current events. However, Tweets are short, personalized and noisy, thus raises more challenges for event extraction and representation. Extracting events out of Arabic tweets is a new research domain where few examples – if any – of previous work can be found. This paper describes a knowledge-based approach for fostering event extraction out of Arabic tweet...

متن کامل

Text normalization for named entity recognition in Vietnamese tweets

Background Named entity recognition (NER) is a task of detecting named entities in documents and categorizing them to predefined classes, such as person, location, and organization. This paper focuses on tweets posted on Twitter. Since tweets are noisy, irregular, brief, and include acronyms and spelling errors, NER in those tweets is a challenging task. Many approaches have been proposed to de...

متن کامل

NERTUW: Named Entity Recognition on Tweets using Wikipedia

We propose an approach to recognize named entities in tweets, disambiguate and classify them into four categories namely person, organization, location and miscellaneous using Wikipedia. Our approach annotates the tweets on the fly, ie, it does not require any training data.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011